A linear-time algorithm for computing the multinomial stochastic complexity
Abstract
The minimum description length (MDL) principle is a theoretically well-founded, general framework for model class selection and other types of statistical inference. The framework can be applied to tasks such as data clustering, density estimation and image denoising. The MDL principle is formalized via the so-called normalized maximum likelihood (NML) distribution, which has several desirable theoretical properties. The codelength of a given data sample under the NML distribution is called the stochastic complexity, and it is the basis for MDL model class selection. Unfortunately, for discrete data, straightforward computation of the stochastic complexity takes time exponential in the sample size, because the definition involves a sum over all possible data samples of a fixed size. As the main contribution of this paper, we derive an elegant recursion formula that allows efficient computation of the stochastic complexity for n observations of a single multinomial random variable with K values. The time complexity of the new method is O(n + K), as opposed to the O(n log n log K) obtained with previous results. © 2007 Elsevier B.V. All rights reserved.
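The abstract does not spell out the recursion, so the following is only a minimal sketch. It assumes the recursion takes the form C(K+2, n) = C(K+1, n) + (n/K) C(K, n), where C(K, n) is the NML normalizing sum for n observations of a K-valued multinomial, with C(1, n) = 1 and C(2, n) computed directly as a binomial sum in O(n). All function names are hypothetical, and the stochastic complexity is reported in nats.

```python
import math
from itertools import product


def _xlogx_ratio(h, n):
    """Return h * log(h / n), using the convention 0 * log(0) = 0."""
    return h * math.log(h / n) if h > 0 else 0.0


def binomial_normalizer(n):
    """C(2, n) = sum_h binom(n, h) (h/n)^h ((n-h)/n)^(n-h), in O(n) time."""
    if n == 0:
        return 1.0
    total = 0.0
    for h in range(n + 1):
        # Work in log space so the binomial coefficient cannot overflow.
        log_binom = (math.lgamma(n + 1) - math.lgamma(h + 1)
                     - math.lgamma(n - h + 1))
        total += math.exp(log_binom + _xlogx_ratio(h, n)
                          + _xlogx_ratio(n - h, n))
    return total


def multinomial_normalizer(n, K):
    """C(K, n) via the assumed recursion; O(n + K) time overall."""
    if K < 1:
        raise ValueError("K must be at least 1")
    prev, curr = 1.0, binomial_normalizer(n)  # C(1, n) and C(2, n)
    if K == 1:
        return prev
    for k in range(1, K - 1):
        # Step from (C(k, n), C(k+1, n)) to (C(k+1, n), C(k+2, n)).
        prev, curr = curr, curr + (n / k) * prev
    return curr


def stochastic_complexity(counts):
    """NML codelength in nats for a sample summarized by its value counts."""
    n, K = sum(counts), len(counts)
    log_max_likelihood = sum(_xlogx_ratio(c, n) for c in counts)
    return -log_max_likelihood + math.log(multinomial_normalizer(n, K))


def brute_force_normalizer(n, K):
    """Exponential-time reference: sum the maximized likelihood over all K^n samples."""
    total = 0.0
    for seq in product(range(K), repeat=n):
        counts = [seq.count(v) for v in range(K)]
        total += math.exp(sum(_xlogx_ratio(c, n) for c in counts))
    return total


if __name__ == "__main__":
    print(multinomial_normalizer(5, 4), brute_force_normalizer(5, 4))
    print(stochastic_complexity([3, 1, 1, 0]))
```

The brute-force function is there only to check the recursion on tiny inputs, since it enumerates all K^n samples. The sketch keeps C(K, n) in ordinary floats, which is enough for illustration; for very large n and K the normalizer itself could overflow, and a log-domain version of the recursion would be needed.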
Journal: Inf. Process. Lett.
Volume: 103
Publication date: 2007